Medium effects on the selection of sequences folding 
into stable proteins in a simple model

We study the medium effects on the selection of sequences in protein folding
by taking account of the surface potential in the HP model. Our analysis of the
proportion of H and P monomers in the sequences gives a direct interpretation
of why the lowly designable structures possess a small average gap. Numerical
calculation with our model shows that the surface potential enhances the
average gap of highly designable structures. It also shows that the most stable
structure may no longer be the most stable one if the medium parameters change.

Proteins are known to play a vital role in the structure and functioning of all
forms of life, and the protein folding problem is one of the most fundamental
and still unsolved problems. Composed of a specific sequence of amino acids,
each protein folds into a native structure (a particular 3-dimensional shape)
that determines its biological function, and it is widely believed that for
most single-domain proteins the native structure is the global free-energy
minimum [1]. The amino-acid sequence alone encodes sufficient information [?]
to determine its 3-d structure. Theoretical studies on protein sequence and
structure include molecular dynamics simulation [5] and lattice models [3].
The latter have attracted much attention [?], while the former takes much CPU
time even on huge computers.

Because the naturally occurring amino acids can be classified [6] as either
hydrophobic (H) or polar (P), an HP lattice model was introduced to interpret
protein folding. Based on the so-called standard HP model, with 27 monomers
occupying all sites of a cube [?], Li et al. [7] introduced the designability
to show that potentially good sequences are those with a unique ground state
separated by a large gap from the first excited state. By defining the
designability of a structure as the number of sequences that possess the
structure as their unique lowest-energy state, they found that structures
differ drastically in their designabilities. The sequences that design the
highly designable structures are thermodynamically more stable [3,13]. Studies
of the designability for a larger lattice model [?] and for an off-lattice
model [13] showed similar results. For many-letter models, different
parameters gave different results: Buchler et al. [?] found that the
designability of a structure depends sensitively on the size of the alphabet,
whereas Li et al. [3] found that the designability of a structure is not
sensitive to the alphabet size when a realistic interaction potential (the MJ
matrix) is employed. Ejtehadi et al. found that if the strength of the
non-additive part of the interaction potential becomes larger than a critical
value, the degree of designability of structures will depend on the parameters
of the potential [13].
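
The designability count and the gap to the first excited state can be
illustrated on a system far smaller than the 27-site cube. The following is a
minimal sketch, not the calculation of Li et al.: it assumes a 3 x 3 square
lattice, identifies a structure with its contact map (an assumption that merges
symmetry-related walks), and uses contact energies E_HH = -2.3, E_HP = -1,
E_PP = 0 scaled by 10 so that all comparisons are exact integers. All function
names are illustrative.

```python
from itertools import product

# Contact energies scaled by 10 (E_HH = -2.3, E_HP = -1, E_PP = 0) to stay in integers.
ENERGY = {"HH": -23, "HP": -10, "PH": -10, "PP": 0}


def compact_walks(n=3):
    """Enumerate all self-avoiding walks that visit every site of an n x n lattice."""
    walks = []

    def extend(walk, visited):
        if len(walk) == n * n:
            walks.append(tuple(walk))
            return
        x, y = walk[-1]
        for step in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= step[0] < n and 0 <= step[1] < n and step not in visited:
                visited.add(step)
                walk.append(step)
                extend(walk, visited)
                walk.pop()
                visited.remove(step)

    for start in product(range(n), repeat=2):
        extend([start], {start})
    return walks


def contact_map(walk):
    """Pairs (i, j) of monomers that are lattice neighbours but not chain neighbours."""
    index = {site: i for i, site in enumerate(walk)}
    contacts = set()
    for (x, y), i in index.items():
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index.get(neighbour)
            if j is not None and abs(i - j) > 1:
                contacts.add((min(i, j), max(i, j)))
    return frozenset(contacts)


def designability(n=3):
    """Return {structure: (N_S, average gap)} over all 2**(n*n) HP sequences."""
    structures = sorted({contact_map(w) for w in compact_walks(n)}, key=sorted)
    gaps = {s: [] for s in structures}
    for seq in product("HP", repeat=n * n):
        ranked = sorted(
            (sum(ENERGY[seq[i] + seq[j]] for i, j in s), k)
            for k, s in enumerate(structures)
        )
        (e0, k0), (e1, _) = ranked[0], ranked[1]
        if e0 < e1:  # unique ground state: this sequence designs structure k0
            gaps[structures[k0]].append((e1 - e0) / 10)
    return {s: (len(g), sum(g) / len(g) if g else 0.0) for s, g in gaps.items()}


if __name__ == "__main__":
    for structure, (n_s, avg_gap) in designability().items():
        print(sorted(structure), n_s, round(avg_gap, 3))
```

Sequences whose two lowest structures are degenerate (for example all-H or
all-P) design nothing and are not counted, which mirrors the "unique ground
state" condition in the definition.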

Since useful features concerning protein folding and stability can be explored
on the basis of lattice models, it is worthwhile to study the effect of the
medium on protein folding properties. In this letter, we consider the medium
effects by introducing different parameters that characterize various
concentrations of the medium solution. Our results give some answers to the
following questions: are the sequences associated with highly designable
structures universally good? How do they vary depending on the medium [?] in
which the protein is placed?

We investigate the effects of the medium on the category of highly designable
protein sequences, which should provide a clue to understanding the variations
in the natural selection of protein species caused by the medium in which the
protein lives. For this purpose, we extend the original HP model by
introducing potential parameters for the monomers at the protein's surface.
The protein is represented as a chain of beads occupying the sites of a lattice
in a self-avoiding way, so in our model the energy of a sequence folded into a
particular structure reads,
where i, j denote the successive labels of monomers in a sequence, r_i the
position (of the i-th monomer) on the lattice sites, and σ_i refers to H or P,
corresponding to a hydrophobic or polar monomer. Here the Kronecker delta
notation is adopted, i.e., δ_{a,b} = 1 if a = b but δ_{a,b} = 0 if a ≠ b. As
the hydrophobic force [?] drives the protein to fold into a compact shape with
as many hydrophobic monomers inside as possible, the HH contacts are more
favorable in this model, which can be characterized by choosing E_PP = 0,
E_HP = -1, and E_HH = -2.3 as adopted in Ref. [?]. In order to include the
effects caused by the protein's surrounding medium, which are related to the
salt concentration [14] of the solution in which the protein is placed, we
introduce U_V, U_E, and U_F to represent the attractive potentials at the
protein surface for polar (hydrophilic) monomers at vertices, edges, or face
centers, respectively. These attractive forces are exerted by the medium
(solution) on the hydrophilic monomers. Since we are not able to deal with a
spherical surface in the present lattice model, we consider different weights
at the surface, namely U_t = -γ_t V with t = V, E, F. If γ_V = γ_E = γ_F ≠ 0,
no new results occur in comparison with those of Li et al. This is because the
core of the cube in the 27-site model always contains a hydrophobic monomer,
which implies that the surface potentials merely cause a global shift in the
energy spectrum of the 27-site model if we impose equal weights on the
vertices, edges, and face centers. We then investigate several cases of
non-vanishing γ's later on.
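
A form of the energy function consistent with the symbol definitions above,
offered as a hedged reconstruction rather than the authors' exact expression,
combines the standard HP contact term with a surface term acting on polar
monomers. The site-type label t(r), which returns V, E, or F for a surface
site, is an assumed notation, and U is taken to vanish at the body center:

```latex
H = \sum_{i<j} E_{\sigma_i \sigma_j}\,
      \delta_{|\mathbf{r}_i-\mathbf{r}_j|,\,1}\,
      \bigl(1-\delta_{|i-j|,\,1}\bigr)
  + \sum_{i} \delta_{\sigma_i,\mathrm{P}}\, U_{t(\mathbf{r}_i)},
\qquad U_t = -\gamma_t V,\quad t \in \{V, E, F\}.
```

Under this form, if γ_V = γ_E = γ_F = γ and the body center is occupied by an H
monomer, every polar monomer sits on the surface and the second sum equals
-γ V N_P, where N_P is the number of polar monomers in the sequence. The shift
is then the same for every such structure of a given sequence, so the energy
gaps are unchanged.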

It has been noticed [?] that some structures can be designed by a large number
of sequences, while others can be designed by only a few sequences. The
designability of a structure is measured by the number (N_S) of sequences that
take the given structure as their unique ground state, as was first introduced
by Li et al. [3]. Additionally, structures differ drastically in their
designability, i.e., highly designable structures emerge with a number of
associated sequences much larger than the average. For a particular sequence,
the energy gap δ_S is the minimum energy needed to change its ground-state
structure into a different compact structure. The average energy gap δ̄_S for
a given structure is evaluated by averaging the gaps over all the N_S sequences
that design that structure. The structures with large N_S have a much larger
average gap than those with small N_S, and there is an apparent jump around
N_S = 1400 in the average energy gap. This feature was first noticed by Li et
al. [7] in the medium-independent HP model; thus these highly designable
structures are thermodynamically more stable and possess protein-like
secondary structures into which the protein sequences fold faster than into
the other structures [3]. To interpret this feature, we calculate the average
distribution of the number of hydrophobic monomers for the highly designable
structures and for the lowly designable structures, respectively. We plot
these two distributions together with the purely mathematical binary
arrangement distribution in Fig. 1, where all distributions are normalized to
unity. Clearly, the distribution for highly designable structures shifts
toward larger numbers of hydrophobic monomers in comparison to the
mathematical distribution. This leads to a lower energy scale because the more
hydrophobic monomers there are, the lower the energy will be. In contrast, the
distribution for lowly designable structures shifts toward smaller numbers of
hydrophobic monomers in comparison to the mathematical distribution, which
causes a higher energy. This may explain why the lowly designable structures
possess a small average gap.
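
The "binary arrangement distribution" used as the reference curve can be read
as the binomial distribution of the number of H monomers when each of the 27
sites is H or P with equal probability. A minimal sketch under that reading
(the function name is illustrative and the equal-probability assumption is not
stated in the text):

```python
from math import comb


def binary_arrangement_distribution(n_sites=27):
    """P(k) = C(n, k) / 2**n: fraction of the 2**n HP sequences with exactly k H monomers."""
    total = 2 ** n_sites
    return [comb(n_sites, k) / total for k in range(n_sites + 1)]


dist = binary_arrangement_distribution()
mean_h = sum(k * p for k, p in enumerate(dist))
print(f"sum = {sum(dist):.6f}, mean number of H = {mean_h:.2f}")
```

A measured distribution whose weight lies above the reference mean of 13.5
indicates sequences richer in H than a random arrangement, as described for the
highly designable structures.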

Although the choice of E_PP = 0, E_HP = -1, and E_HH = -2.3 adopted in
Ref. [?] fulfils the principle that the major driving force for protein
folding is the hydrophobic force, the difference between the H-H contacts
occurring inside the protein and those occurring at the surface

FIG. 1: Comparison of the distributions for the binary arrangement (green
dotted line), the lowly designable structures (red dash-dotted line), and the
highly designable structures (black solid line).

was disregarded. Therefore, to explore how the designability is affected by
the medium surrounding the protein, it is necessary to include a surface
potential in our model. We pointed out above that 26 of the monomers are on
the surface in the 27-site model, which gives a trivial result for uniform
weights of the surface potential. On the other hand, increasing the number of
lattice sites would take the model beyond the capacity of present-day
computers. However, after some further tuning of the original model, we are
able to obtain nontrivial and interesting results. First, we consider a "cubic
shape approximation" by imposing different potential weights: γ_V = 7/8,
γ_E = 6/8, and γ_F = 4/8, which come from the different interfaces between the
medium solution and the monomers at the vertices, edges, and face centers,
respectively. For this parameter choice, we find 17 more sequences possessing
a unique ground state regardless of the magnitude of V (ranging from 0.1 to
2.1), although they do not possess unique ground states in the model studied
by Li et al., where the effect of the medium was neglected [?]. Our
calculation further shows that 14 of those 17 sequences mainly belong to the
highly designable structures and have relatively larger energy gaps. We
analyze all 17 sequences and find that the 14 can be related to each other by
a single mutation, which implies that they belong to the "neutral island"
suggested by Trinquier et al. [13]. These results support the view that
protein structures are selected in nature because they are readily designed
and stable against mutations, and that such a selection simultaneously leads
to thermodynamic stability and foldability. Thus, a key point in understanding
the protein-folding problem is to understand the emergence and the properties
of highly designable structures.
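
Whether a set of sequences is "related by a single mutation" can be checked by
grouping sequences into connected components in which neighbours differ at
exactly one site. The sketch below is one possible reading of that statement,
not the authors' procedure; the function names and the example sequences are
hypothetical.

```python
from collections import deque


def hamming(a, b):
    """Number of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mutation_islands(sequences):
    """Group sequences into connected components under single-site mutations."""
    remaining = set(sequences)
    islands = []
    while remaining:
        seed = remaining.pop()
        island, queue = {seed}, deque([seed])
        while queue:
            current = queue.popleft()
            linked = {s for s in remaining if hamming(current, s) == 1}
            remaining -= linked
            island |= linked
            queue.extend(linked)
        islands.append(island)
    return islands


# Hypothetical 6-letter sequences, not taken from the paper.
example = ["HHPPHP", "HHPPHH", "HPPPHH", "PPPPPP"]
print(sorted(len(island) for island in mutation_islands(example)))  # [1, 3]
```

In this toy input the first three sequences form one island, because each is
one mutation away from another member, while the last one stands alone.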

The second parametrization is γ_V = 7/8, γ_E = 6/8, and γ_F = 0, which models
a protein with 7 monomers inside and 20 at the surface. In this case, we find
48 more sequences possessing a unique ground state for a wider range of
magnitudes of V (from 0.0001 to 2.1), which do not have unique ground states
in the case of Li et al. [3]. However,



FIG. 2: Average gap of structures versus N_S of the structures in the case of
γ_V = 7/8, γ_E = 6/8, γ_F = 0 for (a) V = 0.0001, (b) V = 0.9, and
(c) V = 2.1, respectively.

FIG. 3: The largest average gap δ_max versus the parameter V: (a) for
γ_V = 7/8, γ_E = 6/8, γ_F = 4/8; (b) for γ_V = 7/8, γ_E = 6/8, γ_F = 0;
(c) for γ_V = 1, γ_E = 1, γ_F = 0.

only one sequence designs a highly designable structure while the other 47
sequences design lowly designable structures. All the energy gaps of those new
sequences are found to be V/8. Since the ratio of the number of monomers at
the surface to that inside is of order 1 in natural proteins [?], and the
ratio in our model is 26:1 in the first case but 20:7 in the second case, the
latter case ought to be closer to natural proteins. Fig. 2 shows the average
energy gap for different potential parameters. Clearly, the surface potential
enhances the average gap of highly designable structures, which suggests that
the highly designable structures selected by nature are more stable in proper
media than in "vacuum". A recent experiment [?] revealed that the additional
stability of a thermophilic protein comes from just a few residues at the
protein surface. Thus our theoretical results may draw more attention to the
dependence of stability on medium effects in further model studies.
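
The site counts behind the 26:1 and 20:7 ratios, and the surface term for a
given assignment of monomers, can be made concrete for the 3 x 3 x 3 cube. The
sketch below is illustrative only: it classifies a site by how many of its
coordinates lie on the boundary and sums U_t = -γ_t V over polar monomers. The
function names and the example assignment are hypothetical.

```python
from fractions import Fraction
from itertools import product


def site_type(site, n=3):
    """Classify a site of an n x n x n cube: 'V' vertex, 'E' edge, 'F' face, 'C' interior."""
    on_boundary = sum(c in (0, n - 1) for c in site)
    return {3: "V", 2: "E", 1: "F", 0: "C"}[on_boundary]


def surface_energy(assignment, gamma, V=1):
    """Sum of U_t = -gamma_t * V over polar ('P') monomers; interior sites contribute 0."""
    return sum(
        -gamma.get(site_type(site), 0) * V
        for site, kind in assignment.items()
        if kind == "P"
    )


sites = list(product(range(3), repeat=3))
counts = {t: sum(site_type(s) == t for s in sites) for t in "VEFC"}
print(counts)  # {'V': 8, 'E': 12, 'F': 6, 'C': 1}

cubic_shape = {"V": Fraction(7, 8), "E": Fraction(6, 8), "F": Fraction(4, 8)}
# Hypothetical assignment: H at the body center, P on every surface site.
polar_surface = {s: ("H" if site_type(s) == "C" else "P") for s in sites}
print(surface_energy(polar_surface, cubic_shape))  # -19
```

With γ_F = 0 the 6 face centers no longer feel the potential, so 8 + 12 = 20
sites act as surface and 6 + 1 = 7 as interior, which is the 20:7 split of the
second parametrization.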

We also calculate the case in which the potentials at the vertices and at the
edges have the same weight, i.e., γ_V = 1, γ_E = 1, and γ_F = 0. We find that
there is no sequence beyond those of Ref. [?] that takes the highly designable
structures. Just as in Ref. [?], there are also 60 structures that possess a
large average gap. When we take account of the effects of the medium, the
average gap of highly designable structures increases appreciably as the
potential parameter increases, but the average gap of lowly designable
structures does not change much. In all the aforementioned cases, the average
gap of a single highly designable structure increases linearly with V.
Furthermore, we find that the structure with the largest average gap is not
fixed for all potential parameters. Crossings between energy levels always
take place when the potential parameter changes. It is therefore worthwhile to
point out that the gains in stability for distinct structures vary, and the
most stable protein structure in one surrounding medium may no longer be the
most stable one in another medium. The plots of the largest average gap versus
the parameter V are shown in Fig. 3 for the three cases of the weights γ
discussed above. In order to make the change visible to the eye, we have set
the value on the vertical axis in Fig. 3 to be the largest average gap minus
0.21V, 0.5V, and 0.6V for the cases (a), (b), and (c), respectively. In each
case there is a critical value of V across which the plot passes from one
straight line to another. The critical values of V differ between cases, but
the largest average gap at the transition point takes the same value,
δ̄_S = 1.4137.
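
Because the average gap of each highly designable structure grows linearly
with V, the largest average gap is the upper envelope of a family of straight
lines, and a change of the most stable structure appears as a kink where two
lines cross. A minimal sketch with hypothetical intercepts and slopes (these
are not values from this work, and the function names are illustrative):

```python
def crossing_point(line_a, line_b):
    """V at which two lines gap = a + b * V intersect; None if they are parallel."""
    (a1, b1), (a2, b2) = line_a, line_b
    if b1 == b2:
        return None
    return (a1 - a2) / (b2 - b1)


def most_stable(lines, V):
    """Index and value of the line with the largest average gap at a given V."""
    values = [a + b * V for a, b in lines]
    best = max(range(len(lines)), key=values.__getitem__)
    return best, values[best]


# Hypothetical structures: (average gap at V = 0, slope with respect to V).
lines = [(1.30, 0.25), (1.20, 0.50)]
v_c = crossing_point(*lines)
print(v_c)
print(most_stable(lines, 0.2)[0], most_stable(lines, 0.8)[0])
```

Below the crossing the first structure has the larger gap and above it the
second one does, which is the kind of exchange the critical value of V marks.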

We analyze all the sequences that design each of the 60 highly designable
structures. In the absence of a medium, γ_V = γ_E = γ_F = 0, the energy gaps
δ_S of those sequences range from 0.3 to 2.6 (see Fig. 4). Almost half of them
have small energy gaps (around 0.3).

FIG. 4: Histogram of the number of sequences versus the energy gap for the 60
highly designable structures in the absence of a medium (left) and in the
presence of a medium with γ_V = 7/8, γ_E = 6/8, γ_F = 0, V = 2.1 (right).



In the presence of a medium, the energy gaps of most of the sequences with
larger (over 1) energy gaps rise as the parameter V increases, while those of
the sequences with small energy gaps do not rise appreciably. For the cases
(a) γ_V = 7/8, γ_E = 6/8, γ_F = 4/8, (b) γ_V = 7/8, γ_E = 6/8, γ_F = 0, and
(c) γ_V = γ_E = 1, γ_F = 0, the increments in energy gap are mainly 3V/8,
7V/8, and V, respectively. There is also a small portion of sequences whose
energy gaps decrease in the medium, e.g., 276 sequences in the case
γ_V = 7/8, γ_E = 6/8, γ_F = 4/8. Considering some particular structures among
the 60 highly designable ones, we analyze the sequences that design them. The
energy gap of the sequences with larger energy gaps mostly increases when the
sequence is placed in a medium, which leads to the linear increase of the
average gap. Our results also show that the distribution shapes are similar
for those three structures. In addition, the total number of sequences in (b)
is less than in (c), but there are many more sequences possessing a large
energy gap in (b) than in (c).
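
The dominant increments quoted above coincide numerically with the difference
between the vertex and face-center weights in each case. This is an
observation about the numbers, offered as a hedged reading rather than a
statement made in the text:

```latex
\begin{aligned}
\text{(a)}\quad (\gamma_V-\gamma_F)\,V &= \left(\tfrac{7}{8}-\tfrac{4}{8}\right)V = \tfrac{3}{8}\,V,\\
\text{(b)}\quad (\gamma_V-\gamma_F)\,V &= \left(\tfrac{7}{8}-0\right)V = \tfrac{7}{8}\,V,\\
\text{(c)}\quad (\gamma_V-\gamma_F)\,V &= (1-0)\,V = V.
\end{aligned}
```

Similarly, the gap V/8 reported for the 48 new sequences in case (b) equals
(γ_V - γ_E)V = (7/8 - 6/8)V. Such values would arise if the lowest excitation
moves a polar monomer between sites of the two types, but that mechanism is an
assumption here.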

In summary, our simple analysis of the average distribution of the number of
hydrophobic monomers can explain why the lowly designable structures possess a
small average gap. Our model study shows that the surface potential enhances
the average gap of highly designable structures, which implies that the highly
designable structures selected by nature are more stable in proper media than
in "vacuum". We found that the energy gap of the sequences with larger energy
gaps mostly increases when the sequence is placed in a medium, which leads to
the linear increase of the average gap. We also noticed that there is a
critical value of the surface-potential parameter, which means that the most
stable structure may no longer be the most stable one if the medium parameters
change. Since many studies have shown that several properties of natural
proteins can be captured by simple models, our discussion above may motivate
the modeling of medium effects in theoretical studies where the medium
potential has been ignored.